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ABSTRACT 

We propose a new model-based computer-aided diagnosis 
(CAD) system for tumor detection and classification (can- 
cerous v.s. benign) in breast images. Specifically, we show 
that (x-ray, ultrasound and MRI) images can be accurately 
modeled by two-dimensional autoregressive-moving average 
(ARMA) random fields. We derive a two- stage Yule- Walker 
Least-Squares estimates of the model parameters, which are 
subsequently used as the basis for statistical inference and 
biophysical interpretation of the breast image. We use a 
k-means classifier to segment the breast image into three re- 
gions: healthy tissue, benign tumor, and cancerous tumor. 
Our simulation results on ultrasound breast images illustrate 
the power of the proposed approach. 

Index Terms — Breast cancer, two-dimensional ARMA 
models, k-means algorithm. 



1. INTRODUCTION 

Breast cancer continues to be a significant public health prob- 
lem in the United States: It is the second leading cause of fe- 
male mortality, and, disturbingly, one out of eight women in 
the United States will be diagnosed with breast cancer in her 
life time. Until the cause of this disease is fully understood, 
early detection remains the only hope to improve breast can- 
cer prognosis and treatment. Breast cancer screening modali- 
ties are mainly based on clinical examination, mammography, 
ultrasound imaging, magnetic resonance imaging (MRI), and 
core biopsy. Mammography (breast x-ray imaging) is by far 
the fastest and cheapest screening test for breast cancer. Un- 
fortunately, it is also among the most difficult of radiolog- 
ical images to interpret: mammograms are of low contrast, 
and features indicative of breast disease are often very small. 
Many studies have shown that ultrasound and MRI imaging 
techniques can help supplement mammography by detecting 
small breast cancers that may not be visible with mammog- 
raphy. However, these techniques often fail to determine if a 
detected tumor is cancerous or benign, and a biopsy may be 
recommended. Consequently, many unnecessary biopsies are 
often undertaken due to the high false positive rate. 
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Computer aided diagnosis (CAD) paradigms have re- 
cently received great attention for lesion detection and dis- 
crimination in X-ray and ultrasound breast mammograms 
[1-4]. The large amount of negative biopsies encountered 
in clinical practice could be reduced if a computer system 
was available to help the radiologists screen breast images. 
Broadly, the CAD systems proposed in the literature can 
be grouped into four major categories: geometrical [1], ar- 
tificial intelligence [2], pyramidal (or multiresolution) [3], 
and model-based techniques [4], [5]. Geometrical methods 
employ morphological and other segmentation techniques to 
extract small specks of calcium known as microcalcifications 
from breast images [1]. However, this procedure usually re- 
quires a priori knowledge of the tumor pattern characteristics. 
Moreover, these techniques also tend to rely on many stages 
of heuristics attempting to eliminate false positives. Artificial 
intelligence techniques include neural networks and fuzzy 
logic methods. The performance of these systems is tied to 
the architecture of the network and the number of training 
data. Breast cancer is a heterogeneous disease which in- 
cludes several subtypes with distinct prognosis. In particular, 
the variability associated with the appearances of the breast 
cancer, ranging from relative uniformity to complex patterns 
of bright streaks and blobs [2], makes the ANN require a 
large training data set to ensure a certain level of reliability. 
Pyramidal or multiresolution techniques refer mainly to the 
wavelet transform [3], which can be seen from a signal de- 
composition view point. Specifically, a signal is decomposed 
onto a set of the basis wavelet functions. A very appealing 
feature of the wavelet analysis is that it provides a uniform 
resolution for all the scales. Limited by the size of the basic 
wavelet function, the downside of the uniform resolution is 
uniformly poor resolution. Model-based methods include 
linear, non-linear and finite-element methods to build an ac- 
curate model of the breast [4], [5]. The model is subsequently 
used for image matching, detection, and classification [5]. 
The accuracy of the results are tied to the accuracy of the 
considered model. 

In this work, we propose a new model-based CAD sys- 
tem for tumor detection and classification. We show that (x- 
ray, ultrasound, and MRI) breast images can be accurately 



modeled by two-dimensional autoregressive moving average 
(ARMA) random fields. The model parameters, being the 
fingerprints of the image, serve as the basis for statistical in- 
ference and biophysical interpretation of the breast image. 
ARMA models are parametric representations of wide- sense 
stationary (WSS) processes with rational spectra. The Wold 
decomposition theorem states that any WSS process can be 
decomposed as the sum of a regular process, which spectrum 
is continuous, and a predictable process, which spectrum con- 
sists of impulses. Since rational spectra form a dense set in 
the class of continuous spectra, the ARMA model renders ac- 
curately the regular part of the WSS process. It is, therefore, 
surprising that very few researchers have attempted to derive 
a general ARMA representation of the breast, and use it for 
tumor detection and classification. In [5], the authors use a 
one-dimensional fractional differencing autoregressive mov- 
ing average (FARMA) process to model the ultrasound RF 
echo reflected from the breast tissue. However, by consid- 
ering separate scan lines, they do not take into account the 
two-dimensional spatial correlation between the pixels in the 
image. In [6], an autoregressive (AR) model is considered for 
improving the contrast of breast cancer lesions in ultrasound 
images. ARMA models, however, provide a more accurate 
model of a homogeneous random field than an AR model. As 
in the ID case, the 2D ARMA parameter estimation problem 
is much more difficult than its 2D AR counterpart, due to the 
non-linearity in estimating the 2D moving average (MA) pa- 
rameters. 

This paper is organized as follows: In Section[2l we define 
the 2D ARMA representation of the breast image, and derive 
a Yule- Walker Least squares estimates of its parameters [7]. 
In Section^ we use the estimated ARMA coefficients as vec- 
tor features for the k-means classifier. The simulation results, 
for ultrasound breast images showing cancerous and benign 
tumors, are shown in Section [4] Finally, in Section [5J we 
summarize our contribution and provide concluding remarks. 

2. 2D ARMA MODELING 

We represent the breast image as a 2D random field {x[n : m] , 
(n, m) G Z 2 }. We define a total order on the discrete lattice 
as follows 



(ij) < M) 
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The 2D ARMA(pi,p 2 , qi, #2) model is defined for the N\ x 
N 2 image / = {x[n, m] : < n < JVi - 1, < m < N 2 - 1} 
by the following difference equation 

Pi P2 

x[n,m]+ Q>ijx[n — i,m — j] = 

2=0 j=0 

(*,j)#(0,0) 

w[n,m]+ 5^5^ bijw[n — i, m — j], (2) 
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where {w[n, m]} is a white noise field with variance a 2 , and 
the coefficients {<%•}, {bij} are the parameters of the model. 
From Eq. ([2]), the image {x[n, m]} can be viewed as the out- 
put of the linear time-invariant causal system H(zi,z 2 ) ex- 
cited by a white noise input, where 

B( Zl ,z 2 ) EiLo E|Lp bjj z^z~ J 
n(z 1 ,z 2 ) = — : N = — ^ no 7, ( 3 ) 



withaoo = b 00 = 1. 

2.1. Yule- Walker Least-Squares Parameter Estimation 

Assume first that the noise sequence {w[n, m]} were known. 
Then the problem of estimating the parameters in the ARMA 
model (El) would be a simple input-output system parameter 
estimation problem, which could be solved by several meth- 
ods, the simplest of which is the least-squares (LS) method. 
In the LS method, we express Eq. as 



x[n, m] + </> £ [n, m]0 = w[n, m], 



(4) 



where 



0*[n, m] — [x[n, m — 1], • • • , x[n — pi, m — p 2 ], — w[n, m 
••• , -w[n -qx.m- q 2 ]}, 

and 

= [aoi, • • • , CLp lP2 ,boi, • • • , b qiq2 \ t . 

Writing Eq. © in matrix form for n = L + 1, • • • , Ni — 1, 

and m = M + 1, • • • , N 2 — 1, for some L > max(pi, qi), 
and M > max(p2, 92), gives 



x + $(9 = w. 
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where 



x = [x[L + 1, M + 1], • • • , x[N x -1,N 2 - 
w= HL + 1,M + 1],-.- ,x[N! - 1,N 2 - 

and $ is displayed below. Assume we know <l>, then we can 
obtain a least- squares estimate of the parameter vector in 
Eq. ([5]) as 

§ = -(^$)- 1 <l> £ x. (6) 

Observe that the input model noise {w[n,m]} in <l> is un- 
known. Nevertheless, it can be estimated by considering 
the noise process w[n,m] as the output of the linear filter 
H{z u z 2 ) = bIIuZ] with input x[n,m]. From Nirenberg's 
proof of the division theorem in multi-dimensional spaces [8], 
we can write the inverse ARMA filter ^i Zl,Z2 l as the infinite 

B{Zl,Z2) 

order AR filter J2^=o Ejlo Q>ijZ\ % z 2 3 - I n me ti me domain, 
we obtain 
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x[n — i, m — j] = w[n, m]. (7) 
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z[L + 1 - pi, M + 1 - p 2 ] -w[L + 1, M] 
z[L + 2 - Pl , M + 1 - p 2 ] -w[L + 2, M] 

x[7Vi - 1 - pi , N 2 - 1 - p 2 ) -w[N! - 1 , N 2 - 2] 



-^[L+l-gi,M + l-g 2 ] \ 
V[L + 2-gi,M + l-g 2 ] 



;[7Vi-l-gi,7V 2 -l-g 2 ] / 



Therefore, we can estimate {w[n, m]} by first estimating the 
AR parameters {g^ } and next obtaining {w[n, m]} by fil- 
tering {x[n, m]} as in Eq. (|7]). Since we cannot estimate 
an infinite number of (independent) parameters from a finite 
number of samples, we approximate the finite AR model by 
one of finite order, say (ifi, K 2 ). The parameters in the trun- 
cated AR model can be estimated by using a 2D extension of 
the Yule- Walker equations as follows 

r M + aijr[k-i,l-j]=cj 2 5[k,ll (8) 

2=0 j=0 
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where {r[fc, Z]} are the autocorrelation values of the field 
{x[n, m]}, computed as follows 



r[k,l] 
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r[-Jfe, -I) = r[fc, i], for (fc, Z) > (0, 0) 
r[k, -1} = r[-k, I], for (k, I) > (1, 1), 
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and S[k,l] is the 2D Kronecker delta function. Equation ^ is 
a system of linear equations that can be written in matrix form 
and solved for the coefficients a^- . Finally, the Yule- Walker 
Least-Squares algorithm is summarized below 

1. Estimate the parameters {c^j} in an AR(i^i , K 2 ) 
model of x[n, m] by the YW method in ([8]). Obtain 
an estimate of the noise field {w[n, m] } by 

K 1 K 2 

w[n,m] = x[n,m] + &ijx[n — z, m — j], 

2=0 J=0 

(i,j)^(0,0) 

for n = Kt + 1, • • • , JVi, and m = K 2 + 1, • • • , Af 2 . 

2. Replace the w[n, m] by w[n,m] computed in Step 1. 
Obtain in © with L = Ki + qi, and M = K 2 + ^ 2 . 

3. TUMOR DETECTION AND CLASSIFICATION 

The estimated ARMA parameters, {<%•}, {^ij}, are used as a 
basis for inference about the presence of a tumor and its na- 
ture: benign or cancerous. We use the k-means algorithm to 
segment the breast image into 3 classes: healthy tissue, be- 
nign tumor and cancerous tumor. Our method consists of rep- 
resenting each pixel in the image by an ARMA model whose 



parameters are estimated by using an appropriate neighbor- 
hood for the pixel. We make the assumption that all pix- 
els in the considered neighborhood belong to the same class, 
and hence, for computational efficiency, we replace the entire 
neighborhood by the vector value of the estimated ARMA pa- 
rameters. This procedure is repeated for the entire image, cre- 
ating a new block by block vector- valued image, which will 
be the input to the k-means classifier. 

4. SIMULATIONS 

Although the proposed algorithm is independent of the imag- 
ing modality of the breast, we perform our simulations on 
ultrasound images, collected from the Radiology department, 
College of Medicine at the University of Illinois at Chicago. 
Our database of cancerous images show intraductal carci- 
noma, which is the most common type of breast cancer in 
women. Intraductal carcinoma is usually discovered through 
a mammogram or an ultrasound as microcalcifications. Our 
benign tumor images show the Fibroadenoma of the breast, 
which is a benign fibroepithelial tumor characterized by pro- 
liferation of both glandular and stromal elements. 

We estimate the ARMA parameters using a window of 
size 16 x 16. The choice of the window size presents an in- 
herent trade-off between the accuracy of the representation 
and the accuracy of the classification. A large window size 
would lead a better representation of the ARMA model, but 
might include pixels from different classes. We found that for 
256 x 256 images, a 16 x 16 window size leads to a good 
segmentation performance. Figure [T] shows a cancerous im- 
age and a benign tumor image. Their ARMA representations 
are displayed in Figs. \V[b) andQJe), respectively. It is visu- 
ally clear that the ARMA model accurately represents both 
ultrasound images. Figures [Tfc) and[T{f) show the segmen- 
tation outputs of the cancerous and benign tumor images, re- 
spectively. We can observe clear delineations of the tumors 
from the healthy tissues in both cases. Using the University 
of Illinois at Chicago's database of ultrasound breast images, 
our method yields an accuracy (number of images correctly 
classified divided by the total number of images) of 82% (see 
Tabled). 



Table 1. Detection accuracy for the University of Illinois at 
Chicago's database of ultrasound breast images 



Accuracy 82% 




5. CONCLUSION 

We showed that breast images can be accurately represented 
by 2D ARM A models. Furthermore, the parameters of the 
model can be thought of as the fingerprints of the image, 
which are exploited for tumor detection and classification. 
Our simulation results show that, based on the estimated 
ARMA parameters as feature vectors, the classic k-means 
algorithm genuinely segment the breast images into healthy 
tissue, cancerous tumor and benign tumor. The end product 
is a computer-aided diagnosis (CAD) system that clinically 
portends an accurate prognosis of breast cancer. 
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ABSTRACT 

We propose a new model-based computer-aided diagnosis 
(CAD) system for tumor detection and classification (can- 
cerous v.s. benign) in breast images. Specifically, we show 
that (x-ray, ultrasound and MRI) images can be accurately 
modeled by two-dimensional autoregressive-moving average 
(ARMA) random fields. We derive a two- stage Yule- Walker 
Least-Squares estimates of the model parameters, which are 
subsequently used as the basis for statistical inference and 
biophysical interpretation of the breast image. We use a 
k-means classifier to segment the breast image into three re- 
gions: healthy tissue, benign tumor, and cancerous tumor. 
Our simulation results on ultrasound breast images illustrate 
the power of the proposed approach. 

Index Terms — Breast cancer, two-dimensional ARMA 
models, k-means algorithm. 



1. INTRODUCTION 

Breast cancer continues to be a significant public health prob- 
lem in the United States: It is the second leading cause of fe- 
male mortality, and, disturbingly, one out of eight women in 
the United States will be diagnosed with breast cancer in her 
life time. Until the cause of this disease is fully understood, 
early detection remains the only hope to improve breast can- 
cer prognosis and treatment. Breast cancer screening modali- 
ties are mainly based on clinical examination, mammography, 
ultrasound imaging, magnetic resonance imaging (MRI), and 
core biopsy. Mammography (breast x-ray imaging) is by far 
the fastest and cheapest screening test for breast cancer. Un- 
fortunately, it is also among the most difficult of radiolog- 
ical images to interpret: mammograms are of low contrast, 
and features indicative of breast disease are often very small. 
Many studies have shown that ultrasound and MRI imaging 
techniques can help supplement mammography by detecting 
small breast cancers that may not be visible with mammog- 
raphy. However, these techniques often fail to determine if a 
detected tumor is cancerous or benign, and a biopsy may be 
recommended. Consequently, many unnecessary biopsies are 
often undertaken due to the high false positive rate. 
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Computer aided diagnosis (CAD) paradigms have re- 
cently received great attention for lesion detection and dis- 
crimination in X-ray and ultrasound breast mammograms 
[?, ?, ?, ?]. The large amount of negative biopsies encountered 
in clinical practice could be reduced if a computer system 
was available to help the radiologists screen breast images. 
Broadly, the CAD systems proposed in the literature can be 
grouped into four major categories: geometrical [?], artifi- 
cial intelligence [?], pyramidal (or multiresolution) [?], and 
model-based techniques [?], [?]. Geometrical methods em- 
ploy morphological and other segmentation techniques to 
extract small specks of calcium known as microcalcifications 
from breast images [?]. However, this procedure usually re- 
quires a priori knowledge of the tumor pattern characteristics. 
Moreover, these techniques also tend to rely on many stages 
of heuristics attempting to eliminate false positives. Artificial 
intelligence techniques include neural networks and fuzzy 
logic methods. The performance of these systems is tied to 
the architecture of the network and the number of training 
data. Breast cancer is a heterogeneous disease which in- 
cludes several subtypes with distinct prognosis. In particular, 
the variability associated with the appearances of the breast 
cancer, ranging from relative uniformity to complex patterns 
of bright streaks and blobs [?], makes the ANN require a 
large training data set to ensure a certain level of reliability. 
Pyramidal or multiresolution techniques refer mainly to the 
wavelet transform [?], which can be seen from a signal de- 
composition view point. Specifically, a signal is decomposed 
onto a set of the basis wavelet functions. A very appealing 
feature of the wavelet analysis is that it provides a uniform 
resolution for all the scales. Limited by the size of the basic 
wavelet function, the downside of the uniform resolution is 
uniformly poor resolution. Model-based methods include 
linear, non-linear and finite-element methods to build an ac- 
curate model of the breast [?], [?]. The model is subsequently 
used for image matching, detection, and classification [?]. 
The accuracy of the results are tied to the accuracy of the 
considered model. 

In this work, we propose a new model-based CAD sys- 
tem for tumor detection and classification. We show that (x- 
ray, ultrasound, and MRI) breast images can be accurately 



modeled by two-dimensional autoregressive moving average 
(ARM A) random fields. The model parameters, being the 
fingerprints of the image, serve as the basis for statistical in- 
ference and biophysical interpretation of the breast image. 
ARM A models are parametric representations of wide- sense 
stationary (WSS) processes with rational spectra. The Wold 
decomposition theorem states that any WSS process can be 
decomposed as the sum of a regular process, which spectrum 
is continuous, and a predictable process, which spectrum con- 
sists of impulses. Since rational spectra form a dense set in 
the class of continuous spectra, the ARMA model renders ac- 
curately the regular part of the WSS process. It is, therefore, 
surprising that very few researchers have attempted to derive 
a general ARMA representation of the breast, and use it for 
tumor detection and classification. In [?], the authors use a 
one-dimensional fractional differencing autoregressive mov- 
ing average (FARMA) process to model the ultrasound RF 
echo reflected from the breast tissue. However, by consid- 
ering separate scan lines, they do not take into account the 
two-dimensional spatial correlation between the pixels in the 
image. In [?], an autoregressive (AR) model is considered for 
improving the contrast of breast cancer lesions in ultrasound 
images. ARMA models, however, provide a more accurate 
model of a homogeneous random field than an AR model. As 
in the ID case, the 2D ARMA parameter estimation problem 
is much more difficult than its 2D AR counterpart, due to the 
non-linearity in estimating the 2D moving average (MA) pa- 
rameters. 

This paper is organized as follows: In Section ??, we de- 
fine the 2D ARMA representation of the breast image, and 
derive a Yule- Walker Least squares estimates of its param- 
eters [?]. In Section ??, we use the estimated ARMA co- 
efficients as vector features for the k-means classifier. The 
simulation results, for ultrasound breast images showing can- 
cerous and benign tumors, are shown in Section ??. Finally, 
in Section ??, we summarize our contribution and provide 
concluding remarks. 



2. 2D ARMA MODELING 



We represent the breast image as a 2D random field {x[n, m] , 
(n, m) G Z 2 }. We define a total order on the discrete lattice 
as follows 



(hj) < (<M) ^> i < s and j < t. 



(1) 



The 2D ARMA(pi,p 2 , Qi, Q2) model is defined for the Ni x 
N 2 image / = {x[n, m] : < n < N x - 1, < m < N 2 - 1} 



by the following difference equation 

Pi Pi 

x[n,m\ + aijx[n — z, m — j] = 

i=0 j=0 

W)#(o,o) 

w[n,m] + b ijw[n-i,m- j], (2) 

i=0 j=0 
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where {w[n, m]} is a white noise field with variance a 2 , and 
the coefficients {a^}, {bij} are the parameters of the model. 
From Eq. (??), the image {x[n, m]} can be viewed as the 
output of the linear time-invariant causal system H(zi,z 2 ) 
excited by a white noise input, where 

rj ( \ B(z 1 ,z 2 ) YliLo Y^jLo bij z i %z 2 3 - 
with a o = 600 = 1. 

2.1. Yule- Walker Least-Squares Parameter Estimation 

Assume first that the noise sequence {w[n, m]} were known. 
Then the problem of estimating the parameters in the ARMA 
model (??) would be a simple input-output system parameter 
estimation problem, which could be solved by several meth- 
ods, the simplest of which is the least-squares (LS) method. 
In the LS method, we express Eq. (??) as 



x[n, m] + </> £ [n, m]6 = w[n, m], 



(4) 



where 



£ [n, m] = [x[n, m — 1], • • • , x[n — pi, m — p 2 ], — w[n, m 
••• 1 -w[n-q ll m-q 2 }} 1 



and 



Writing Eq. (??) in matrix form for n = L + 1, • • • , N± — 1, 

and m = M + 1, • • • , N 2 — 1, for some L > max(pi, qi), 
and M > max(p2, Q2), gives 



x + <I>(9 = w. 



(5) 



where 



x = [x[L + 1, M + 1], • • • , x[N! - 1, N 2 - 
w = [w[L + 1, M + 1], • • • , x[Ni - 1, N 2 - 

and ^> is displayed below. Assume we know <£, then we can 
obtain a least- squares estimate of the parameter vector in 
Eq. (??) as 

§ = -(^^>)- 1 ^x. (6) 

Observe that the input model noise {w[n,m]} in is un- 
known. Nevertheless, it can be estimated by considering 



